OCPBUGS-62144: Makes sure rendering for egress IP reachabilityTimeout triggers restart of ovnkube (node/control) pods#2955
Conversation
|
@raphaelvrosa: This pull request references Jira Issue OCPBUGS-62144, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThese changes introduce runtime configuration support for Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: raphaelvrosa The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/network/ovn_kubernetes_test.go (1)
3951-3979: Add a same-value rerender case to verify hash stability.The table only checks
nil -> 0 -> 10. Add a repeated value case (for example,10again with a new pointer) and assert hashes do not change, so this test catches non-deterministic hash inputs.Suggested test case addition
{ name: "Reachability timeout changed to 10", reachabilityTimeout: ptrToUint32(10), expectHashChanged: true, expectKubernetesFeatureReachability: true, expectErr: false, }, + { + name: "Reachability timeout remains 10", + reachabilityTimeout: ptrToUint32(10), + expectHashChanged: false, + expectKubernetesFeatureReachability: true, + expectErr: false, + }, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 3951 - 3979, Add a new test case in the testCases slice used in ovn_kubernetes_test.go that repeats the same value for reachabilityTimeout (e.g., use ptrToUint32(10) again with a freshly allocated pointer) and set expectHashChanged to false and expectKubernetesFeatureReachability to true; this ensures the code path that computes the hash (referenced by the testCases slice, reachabilityTimeout, and expectHashChanged) verifies stability across a rerender with identical logical values even when pointer instances differ.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/network/ovn_kubernetes.go`:
- Around line 507-509: The hash currently includes pointer addresses because
data.Data["ReachabilityTotalTimeoutSeconds"] is a *uint32; update the formatting
in the nodeNoOverlayHashData construction (and the similar hash at lines
~519–522) to dereference the pointer and use a stable scalar representation
(e.g., the uint32 value with %d or %v) instead of the pointer itself, and guard
with a nil-check to avoid panics when the pointer is nil; locate the
nodeNoOverlayHashData assembly and the other hash creation that reference
data.Data["ReachabilityTotalTimeoutSeconds"] and replace the pointer formatting
with the dereferenced value.
---
Nitpick comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 3951-3979: Add a new test case in the testCases slice used in
ovn_kubernetes_test.go that repeats the same value for reachabilityTimeout
(e.g., use ptrToUint32(10) again with a freshly allocated pointer) and set
expectHashChanged to false and expectKubernetesFeatureReachability to true; this
ensures the code path that computes the hash (referenced by the testCases slice,
reachabilityTimeout, and expectHashChanged) verifies stability across a rerender
with identical logical values even when pointer instances differ.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a52ef0ea-c53e-4daf-b648-e9d0a4a9457a
📒 Files selected for processing (3)
README.mdpkg/network/ovn_kubernetes.gopkg/network/ovn_kubernetes_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/network/ovn_kubernetes_test.go (1)
4034-4037: Avoid forcing HostedControlPlane in a generic reachability/hash rollout testLine 4034-Line 4037 forces the HyperShift path, which can under-cover the default self-hosted rendering path this PR targets. Prefer default infra here, or add explicit table coverage for both hosted and self-hosted modes.
Minimal scope fix
- // Set is as Hypershift hosted control plane. - bootstrapResult.Infra = bootstrap.InfraStatus{} - bootstrapResult.Infra.HostedControlPlane = &hypershift.HostedControlPlane{} + // Keep default self-hosted path in this test.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 4034 - 4037, The test is forcing the HyperShift path by setting bootstrapResult.Infra.HostedControlPlane, which skips the default self-hosted rendering path; update the test so it leaves bootstrapResult.Infra as the default (do not set HostedControlPlane) or convert this case into table-driven tests that explicitly include both scenarios (one case with bootstrapResult.Infra.HostedControlPlane set and one with it unset) and assert renderOVNKubernetes outputs for each; locate the setup around bootstrapResult.Infra and the call to renderOVNKubernetes to make this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 4059-4067: The test currently only checks for the presence of the
key in ovnkubeConf but should assert the exact rendered timeout value; update
the branch that runs when tc.expectKubernetesFeatureReachability is true to
assert that ovnkubeConf contains the full key=value pair (e.g.
"egressip-reachability-total-timeout=<expected>" ) using the expected value from
the test case (add or use a field on tc such as
tc.egressIPReachabilityTotalTimeout or tc.egressipReachabilityTotalTimeout) so
the test fails if the timeout is rendered incorrectly.
---
Nitpick comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 4034-4037: The test is forcing the HyperShift path by setting
bootstrapResult.Infra.HostedControlPlane, which skips the default self-hosted
rendering path; update the test so it leaves bootstrapResult.Infra as the
default (do not set HostedControlPlane) or convert this case into table-driven
tests that explicitly include both scenarios (one case with
bootstrapResult.Infra.HostedControlPlane set and one with it unset) and assert
renderOVNKubernetes outputs for each; locate the setup around
bootstrapResult.Infra and the call to renderOVNKubernetes to make this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3a133127-6819-4195-9d53-9a25368cdfe5
📒 Files selected for processing (2)
pkg/network/ovn_kubernetes.gopkg/network/ovn_kubernetes_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/network/ovn_kubernetes.go
|
Fix in some CI tests depend on: https://github.com/openshift/origin/pull/30961/changes |
/jira refresh |
|
/retest-required |
|
@raphaelvrosa: This pull request references Jira Issue OCPBUGS-62144, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: huiran0826. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@raphaelvrosa: This pull request references Jira Issue OCPBUGS-62144, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: huiran0826. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest-required |
arghosh93
left a comment
There was a problem hiding this comment.
You will have to find a way to render timeout values to both ovnkube-node and ovnkube-control-plane, and also make sure that both gets redeployed. When ovnkube gets to know about the new timeout value either through an environment variable or through CLI flag, then it can take care of setting things up on the ovnkube side, as implementation for that is already in place.
|
@raphaelvrosa: This pull request references Jira Issue OCPBUGS-62144, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
pkg/network/ovn_kubernetes_test.go (2)
4024-4027: Add a self-hosted render pass for reachability timeout coverage.The test currently forces HyperShift/managed rendering only. Since both managed and self-hosted control-plane templates were changed, add a non-hosted case to cover the self-hosted path as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 4024 - 4027, The test currently only sets bootstrapResult.Infra.HostedControlPlane to a non-nil hypershift.HostedControlPlane and calls renderOVNKubernetes, which skips the self-hosted path; add an additional render pass where bootstrapResult.Infra.HostedControlPlane is nil (or unset) to simulate a self-hosted control plane and call renderOVNKubernetes(config, bootstrapResult, manifestDirOvn, fakeClient, featureGatesCNO) again so the self-hosted templates and reachability timeout path are exercised and asserted.
3955-3955:expectErris dead in the current table.All cases set
expectErr: false, so the error branch is never exercised. Either add failing cases or remove the field/branch to keep the test focused.Also applies to: 4028-4031
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` at line 3955, The table-driven test contains an unused expectErr field (all entries set to false) so the error branch is never exercised; either add one or more negative test rows with expectErr: true to cover the error path or remove the expectErr field and the conditional branch that checks it to simplify the test. Locate the table (the variable containing entries with expectErr) and the code that branches on expectErr in ovn_kubernetes_test.go, then update the table to include failing cases that exercise the error behavior or delete the expectErr column and the associated if/else error handling in the test loop to keep the test focused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml`:
- Around line 159-164: The template control lines for
ReachabilityTotalTimeoutSeconds are not indented to the script block level and
can break YAML parsing; adjust the two Go-template lines (the opening "{{- if
.ReachabilityTotalTimeoutSeconds }}" and the closing "{{- end }}") so they are
indented to the same column as the shell script lines (e.g., prefix with the
same spaces as the surrounding `ovn_eip_reachability_timeout_opt=` and `if [[
... ]]; then` lines), leaving the inner lines unchanged; ensure the variable
name `.ReachabilityTotalTimeoutSeconds` and the shell variable
`ovn_eip_reachability_timeout_opt` remain exactly as in the diff.
In `@bindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yaml`:
- Around line 106-111: The template directives {{- if
.ReachabilityTotalTimeoutSeconds }} and {{- end }} are currently flush-left and
sit outside the shell script block causing YAML parse errors; move/indent both
directives so they align with the surrounding script body under the command: |
block (so the entire if/fi block that sets ovn_eip_reachability_timeout_opt and
references .ReachabilityTotalTimeoutSeconds is indented the same as the other
shell lines), ensuring the opening {{- if ... }} and closing {{- end }} wrap the
internal lines and preserve shell indentation.
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 3951-4096: The test currently verifies rendered ConfigMap and
command but omits asserting that pod-template hash annotations change (or remain
stable) when ReachabilityTotalTimeoutSeconds changes; update the test around
renderOVNKubernetes to capture the pod-template-hash (or the config-hash
annotation) from the ovnkube-control-plane Deployment and the ovnkube-node
DaemonSet (look up objects by GetKind()== "Deployment"/"DaemonSet" and
GetName()=="ovnkube-control-plane"/"ovnkube-node"), store the pre-change
annotation value, re-render (or compare between cases) and assert that the
annotation value mutates when tc.reachabilityTimeout changes and stays equal
when it is unchanged; use the existing variables/config flow
(renderOVNKubernetes, ovnkube-config ConfigMap, scriptCP,
tc.reachabilityTimeout, tc.expectKubernetesFeatureReachability) to locate where
to insert these assertions.
---
Nitpick comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 4024-4027: The test currently only sets
bootstrapResult.Infra.HostedControlPlane to a non-nil
hypershift.HostedControlPlane and calls renderOVNKubernetes, which skips the
self-hosted path; add an additional render pass where
bootstrapResult.Infra.HostedControlPlane is nil (or unset) to simulate a
self-hosted control plane and call renderOVNKubernetes(config, bootstrapResult,
manifestDirOvn, fakeClient, featureGatesCNO) again so the self-hosted templates
and reachability timeout path are exercised and asserted.
- Line 3955: The table-driven test contains an unused expectErr field (all
entries set to false) so the error branch is never exercised; either add one or
more negative test rows with expectErr: true to cover the error path or remove
the expectErr field and the conditional branch that checks it to simplify the
test. Locate the table (the variable containing entries with expectErr) and the
code that branches on expectErr in ovn_kubernetes_test.go, then update the table
to include failing cases that exercise the error behavior or delete the
expectErr column and the associated if/else error handling in the test loop to
keep the test focused.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ca927446-8b07-4d38-a340-286bb39e9962
📒 Files selected for processing (4)
bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yamlbindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yamlpkg/network/ovn_kubernetes.gopkg/network/ovn_kubernetes_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/network/ovn_kubernetes.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/network/ovn_kubernetes_test.go (1)
3951-4118:⚠️ Potential issue | 🟠 MajorMissing regression guard for restart trigger (template/hash mutation).
The test validates rendering, but it still does not assert that
ovnkube-nodeandovnkube-control-planepod templates mutate when timeout changes (and stay stable when unchanged). That is the core behavior this bug fix must protect.Suggested hardening
func TestRenderOVNKubernetesReachability(t *testing.T) { g := NewGomegaWithT(t) + var prevNodeTemplate map[string]interface{} + var prevControlPlaneTemplate map[string]interface{} + var prevTimeout *uint32 - for _, tc := range testCases { + for i, tc := range testCases { t.Run(tc.name, func(t *testing.T) { ... objs, _, err := renderOVNKubernetes(config, bootstrapResult, manifestDirOvn, fakeClient, featureGatesCNO) ... + renderedNode := findInObjs("apps", "DaemonSet", "ovnkube-node", "openshift-ovn-kubernetes", objs) + g.Expect(renderedNode).NotTo(BeNil()) + nodeTemplate, found, err := uns.NestedMap(renderedNode.Object, "spec", "template") + g.Expect(err).NotTo(HaveOccurred()) + g.Expect(found).To(BeTrue()) + + renderedControlPlane := findInObjs("apps", "Deployment", "ovnkube-control-plane", "openshift-ovn-kubernetes", objs) + g.Expect(renderedControlPlane).NotTo(BeNil()) + controlPlaneTemplate, found, err := uns.NestedMap(renderedControlPlane.Object, "spec", "template") + g.Expect(err).NotTo(HaveOccurred()) + g.Expect(found).To(BeTrue()) + + if i > 0 { + timeoutChanged := (prevTimeout == nil) != (tc.reachabilityTimeout == nil) || + (prevTimeout != nil && tc.reachabilityTimeout != nil && *prevTimeout != *tc.reachabilityTimeout) + g.Expect(!reflect.DeepEqual(nodeTemplate, prevNodeTemplate)).To(Equal(timeoutChanged), + "ovnkube-node pod template mutation should track reachability timeout changes") + g.Expect(!reflect.DeepEqual(controlPlaneTemplate, prevControlPlaneTemplate)).To(Equal(timeoutChanged), + "ovnkube-control-plane pod template mutation should track reachability timeout changes") + } + prevNodeTemplate = nodeTemplate + prevControlPlaneTemplate = controlPlaneTemplate + prevTimeout = tc.reachabilityTimeout + if tc.expectKubernetesFeatureReachability { ... } }) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 3951 - 4118, The test lacks assertions that the ovnkube-control-plane and ovnkube-node pod templates actually mutate when the reachability timeout changes (and remain unchanged when the timeout is unchanged); update the test to record the previous rendered pod-template artifacts (use the extracted scriptCP and scriptNode strings or compute a short hash of them) across iterations and then: when a test case changes the timeout value from the previous case assert scriptCP/scriptNode have changed, when the timeout is the same assert they are equal. Use the existing identifiers (testCases, scriptCP, scriptNode, ovnkube-control-plane, ovnkube-node) to locate where to capture and compare previous values and add assertions for mutation vs stability accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 3951-4118: The test lacks assertions that the
ovnkube-control-plane and ovnkube-node pod templates actually mutate when the
reachability timeout changes (and remain unchanged when the timeout is
unchanged); update the test to record the previous rendered pod-template
artifacts (use the extracted scriptCP and scriptNode strings or compute a short
hash of them) across iterations and then: when a test case changes the timeout
value from the previous case assert scriptCP/scriptNode have changed, when the
timeout is the same assert they are equal. Use the existing identifiers
(testCases, scriptCP, scriptNode, ovnkube-control-plane, ovnkube-node) to locate
where to capture and compare previous values and add assertions for mutation vs
stability accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0eec8f20-1828-4d68-b4e7-343340c7e175
📒 Files selected for processing (2)
bindata/network/ovn-kubernetes/common/008-script-lib.yamlpkg/network/ovn_kubernetes_test.go
|
/jira refresh |
|
@raphaelvrosa: This pull request references Jira Issue OCPBUGS-62144, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (huirwang@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
1 similar comment
|
/retest-required |
574491b to
4b2b122
Compare
arghosh93
left a comment
There was a problem hiding this comment.
Please remove https://github.com/raphaelvrosa/cluster-network-operator/blob/master/bindata/network/ovn-kubernetes/self-hosted/004-config.yaml#L40 if thats appropriate.
a52917e to
b819edf
Compare
|
/retest-required |
2 similar comments
|
/retest-required |
|
/retest-required |
|
|
||
| ovn_eip_reachability_timeout_opt= | ||
| {{- if .ReachabilityTotalTimeoutSeconds }} | ||
| if [[ "{{.ReachabilityTotalTimeoutSeconds}}" != "" ]]; then |
There was a problem hiding this comment.
I think you can simply do it like below:
node_mgmt_port_netdev_flags=
if [[ -n "${ReachabilityTotalTimeoutSeconds}" ]] ; then
ovn_eip_reachability_timeout_opt="--egressip-reachability-total-timeout ${ReachabilityTotalTimeoutSeconds}"
fi
There was a problem hiding this comment.
The first check {{- if .ReachabilityTotalTimeoutSeconds }} is important to validate if the value is present (not nil) in the values of the spec template.
The inner verification cannot validate the difference between the nil value and the "zero" value.
There was a problem hiding this comment.
Do you think the suggestion I gave would not work in this case? Can you see how it is done for similar variables in the same file?
There was a problem hiding this comment.
No, it does not work.
It won't pass the test case TestRenderOVNKubernetesReachability/No_reachability_timeout_(nil) .
You can try it, remove the {{ if ... }} {{ end }} statements in 008-script-lib.yaml and run the tests go test -v -count=1 ./pkg/network/ -run TestRenderOVNKubernetesReachability.
In that case, when the value of the setting is nil (not present) it is going to set the keyword in the args of the command. This should not happen. So the test fails.
From my understanding, as this is a integer the nil value is considered as a zero value, so it adds the flag (the test checks if the flag is added and if it contains a value). By doing the check of the nil value first it makes sure the zero value is considered in the inner if statement.
I.e., statements like if [[ {{.ReachabilityTotalTimeoutSeconds}} ]]; then or if [[ "{{.ReachabilityTotalTimeoutSeconds}}" ]]; then do not work either (the test fails).
Makes sense?
|
|
||
| ovn_eip_reachability_timeout_opt= | ||
| {{- if .ReachabilityTotalTimeoutSeconds }} | ||
| if [[ "{{.ReachabilityTotalTimeoutSeconds}}" != "" ]]; then |
|
|
||
| ovn_eip_reachability_timeout_opt= | ||
| {{- if .ReachabilityTotalTimeoutSeconds }} | ||
| if [[ "{{.ReachabilityTotalTimeoutSeconds}}" != "" ]]; then |
| g.Expect(err).NotTo(HaveOccurred()) | ||
|
|
||
| // Find the ovnkube-config ConfigMap and check the template data | ||
| var configMap *uns.Unstructured |
There was a problem hiding this comment.
why we are checking for the configmap? Did you consider removing https://github.com/raphaelvrosa/cluster-network-operator/blob/master/bindata/network/ovn-kubernetes/self-hosted/004-config.yaml#L40 ?
There was a problem hiding this comment.
The goal is just to reboot the pods in this PR.
Besides, the goal is to keep all the configs consistent, so there is no reason to remove the value from the ConfigMap.
There was a problem hiding this comment.
Can you tell me what is the point of keeping it in the configmap? Earlier the configuration was being added to the configmap but redeployment was not being triggered. Now to redeploy, we are going to feed this config to ovnkube using cli. so, why dont we get rid of adding it to the configmap?
There was a problem hiding this comment.
Makes sense. But that is not in the scope of the bug.
Now it is removed from the config map and the tests are changed accordingly.
| } | ||
|
|
||
| // TestRenderOVNKubernetesReachability tests egress IP reachability timeout rendering | ||
| func TestRenderOVNKubernetesReachability(t *testing.T) { |
There was a problem hiding this comment.
if we are doing this, should we do this for all features and not only for egressIP reachability timeout? We probably cant afford to have similar block for all features. WDYT?
There was a problem hiding this comment.
The goal of this PR is to trigger the reboot of ovnkube-* pods. Adding the verification of rendering for all the features in CNO is another topic. There are tests checking the rendering, but I guess not all the dynamic/run-time changes.
There was a problem hiding this comment.
Could you do a follow-up PR, please?
There was a problem hiding this comment.
Ok, I'm opening a bug ticket and do a follow up PR.
73a3800 to
9f32ff2
Compare
9f32ff2 to
73a3800
Compare
Make sure the spec templates of ovnkube node and control-plane pods take into account the changes in the ReachabilityTotalTimeoutSeconds setting. This is implemented by rendering the cli flags in their spec template command. When this value is changed, it then triggers the restart of those pods, reloading the parameter in their configuration. Adds test cases to evaluate the rendering and the changes in ovnkube node and control pod specs. Updates README to document that reachabilityTimeoutSeconds is enabled for runtime updates. Signed-off-by: Raphael Rosa <raprosa@redhat.com>
73a3800 to
607973d
Compare
|
/retest-required |
1 similar comment
|
/retest-required |
Given ovnkube-node and ovnkube-control-plane render this setting ReachabilityTotalTimeoutSeconds in their cli flags now, the field is no longer needed in ovnkube-config config map. Signed-off-by: Raphael Rosa <raprosa@redhat.com>
|
@raphaelvrosa: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Issue: EgressIP
ReachabilityTotalTimeoutSecondsis rendered by ovnkube-config (configmap), but ovnkube node and control pods are not restarted to have the new value applied in their settings.Solution: Make sure the spec templates of ovnkube node and control-plane pods take into account the changes in the
ReachabilityTotalTimeoutSecondssetting. This is implemented by rendering the cli flags in their spec template command. When this value is changed, it then triggers the restart of those pods, reloading the parameter in their configuration.Adds test cases to evaluate the rendering and the hash changes in ovnkube node and control pod specs.
Updates README to document that reachabilityTimeoutSeconds is enabled for runtime updates.
Summary by CodeRabbit
New Features
Tests
Documentation